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Abstract 

This document is the aggregation of several discussions oflL opes et al] (|2010l ) we submitted to 
the proceedings of the Ninth Valencia Meeting, held in Benidorm, Spain, on June 3-8, 2010, in 
conjunction with Hedibert Lopes' talk at this meeting. The main point in those discussions is the 
potential for degeneracy in the particle learning methodology, related with the exponential forgetting 
of the past simulations. We illustrate the resulting difficulties in the case of mixtures. 
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! 1 The case of mixtures (Mengersen, lacobucci and Robert) 

^ ■ In this dis cussion, we primarily consider the performances of the particle learning (PL) technique of 



'sj" ' iLopes et al.. (,2010.) in the specific case of mixtures of distributions. 

in 

Q ■ 1.1 Particle learning 

Reminiscing similar remarks made during Professor Poison's talk at the ISBA 2008 World meeting on 

■ Hamilton Island, we do not understand the purpose of the dismissal of MCMC methods found in the 
paper ("more for less", "direct approximations", &tc.) Convergence of MCMC methods has been the 
core activity of many top researchers in the past two decades, first and foremost Gareth Roberts and Jeff 

. ^ Rosenthal, whose work cannot be so casually ign ored! Especially when considering that, first, the main 

■ appeal in using particle methods ( Gordon et al.l . |1993) is in handling massive data flux at frequencies 



H I MCMC cannot face — and this stands quite separate from a convergence issue — and, secondly, the body 

^ of work produced by the authors as listed in the reference list does not include any in-depth study of the 

convergence properties of the PL method. 

As also argued in other discussions therein, the lack of warning in iLopes et al. I (l20ln^ or m previous 



papers about the unavoidable degeneracy of the method is more than puzzling, as the authors undoubt- 
edly are aware of this. The short paragraph about Monte Carlo errors contained in the current paper 
can be construed to be misleading in this regard since the Monte Carlo error Ct/\/n does not account 
for the resampling step. As demonstrated in the discussion by Robert and Ryder, the error may end up 
being 0(l/t) and miss the standard ^/n Monte Carlo convergence. The corpus of work thus produced 
so far seems to limit itself to the PL processing of an increasing sequence of state-space and dynamic 



*N. Chopin and CP. Robert are partly supported by the 2007-2010 grant ANR-07-BLAN-0237-01 "SP Bayes". J.- 
M. Marin and CP. Robert are partly supported by the 2009-2012 grant ANR-09-BLAN-0218 "Big'MC". Robin Ryder 
is funded by a postdoctoral fellowship from the Fondation des Sciences Mathematiques de Paris. Christian Schafer is 
supported by a PhD grant from CREST. 



1 



examples where the PL method does produce a reasonable output, but this series of case-studies does 
not constitute a sufficient validation in our eyes. (Some of the examples processed in the current paper 
are missing the hyperparameters chosen for their satisfactory resolution.) 

We note as a side remark that the argument found in Section 1.5 of the current paper about the 
difficulty about the improper prior p{6) being solved by using the mixture representation 



p{e) = / p{e\Zo)p{Zo)dZo 



is nonsensical as currently stated: if the marginal distribution of 9 is improper, so is the joint distribution. 
We also fail to understand where in the paper the authors manage to take a "new look at Bayes's 
theorem". If by this they mean the decomposition u sed in the first page of Section 1, this is a standard 
hidden Markov model property (jCappe et al.l . 120041) . 



1.2 Particle learning on mixtures 

In the case of a mixture of k Poisson distributions. 



f{x\uj,fl) = ^p^g{x\X^) , 
i=l 



taken as a first example in ICarvalho et"all ()2009f ). the integrated predictive p{yt+i\3t) can be obtained 



in closed form, as detailed below (since this derivation is central to our own Monte Carlo experiment). 
For Poisson mixtures, the "essential" auxiliary variable is 3t = {n{, . . . , s^, . . . , s^), where n* denotes 
the number of observations allocated to the i-th component and s* the sum of the observations allocated 
to component i (1 < i < fc). 

Thus, under a conjugate Dirichlet- Gamma prior assumption, using the delta function (5,-j and the 
notation 7. = 71 + • • • + 7/c , 



P(yt+i|3t)= / p{yt+i\e)pmt)de 



1 0—1 yt-\-i- I 



j^Vi-f.+t) r(7,+n* + l) T{a, + 4+yt+i) (A+n*)"-+^' 



^ r(7, + n*) j/-i + l!r(7. +t + l) r(a, + s*) (/3^ + n* + l)"-+^Hy*+i 
fe 

X V{p\{jj + n* + S,=j)) Y[ G{>^3\a3 + s] + 8,=jyt+i,pj + n) + Si^^i) dp dA 
i=i 

^r(7.+i) r(7.+n* + l) T{a,+s\+vt+i) (/3, + n*)"'+-^ 



^ r(7, + n*) y-t+ l!r(7. +i+ 1) T{a, + sf) (/3, + + 1)"'+^'+^*+! 
^y^ 7» + »* 1 r(a, + .*+yt+i) (/3. + n*)°-+^' 

^ives a closed form of the predictive distribution of yt+i given the sufficient (simulated) statistic 3t- 
Furthermore, the distribution of the first version of the auxiliary variable, zi, is easily derived, as 



"{zi ^ i\yi) « j Pj.9(yi|Aj)7r(p,A) dpdA 

oc j \y^+'''-\-^.(^+P.) pf-<^-\l- p^Y+T-^,-^ dp, dA, 

= r(yi + aj) (1 + /3,)-(^i+"^) r(i + 7,)r(i + 7. - 7,)/r(2 + 7.) 
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while the next aUocations can be found as 



^{zt+i =i|yt+i,3t) (xp{yt+i,kt+i = j\3t) = J p{yt+i,kt+i = i\0)i:{9\it) d6l 
^, (7j + n])T{a, + s\ + yt+i) + 

and si mulating the parameters o f the Poisson model given 3t is obvious, provided one uses a conjugate 
prior ( Diebolt and Robert . 1990l) . 

When applying both PL and MCMC techniques to a sample of size 10'' extracted from the Monte 
Carlo study detailed in the discussion by lacobucci et al., we obtain the output represented in Figure 
[T]for the posterior distributions of the A^'s. (Both PL and MCMC samples were re-ordered in terms of 
the Ai's. The plots are therefore the posterior distributions of the order statistics The discrepancy 

between both approaches is clear on this example. (We stress that this represents a "worst case" in the 
sense that the observations were chosen from the above-mentioned Monte Carlo experiment by selecting 
the sample producing the largest discrepancy in the evidence approximations. Random picks of samples 
from the Poisson mixture usually produce a better agreement.) 

2 On the approximation of evidence (lacobucci, Robert, Marin 
and Mengersen) 



In this discussion, we consider the performances of the particle learning (PL) technique in the specific 
setting of mixtures of distributions and for the approximation of the evidence 



aka the marginal likelihood. Through a simulation experiment, we examine how much th e degeneracy 
that is inherent to particle systems impacts this approximation (We refer the reader to Chen et ah , 
2000 | . for a general approach to the approximation of evidence and to both Chopin and Robert . .2010 , 
and Marin and Roberg . 20ld for illustrations in the particular setting of mixtures.) 



2.1 Approximation of the evidence 

In the case of a mixture of k Poisson distributions. 



f{x\u),^i) = ^pjg(a;|A,) 



taken as an example in iLopes et all (|2010t ). and studied in lCarvalho et al. I (l2009l) the integrated predictive 
can be obtained in closed form, as derived in the discussion of Mengersen et al. This implies that the 
product approximation to the evidence 



N 

r—1 i—1 



proposed in ICarvalho et al. (l2009l) and iLopes et al.l (I2OIOI) can be implemented here. We thus use 
the setting of Po isson mixtures to evaluate this PL approximation of the evidence and we re-evaluate 
Carvalho et al. 's (|2009h assessment that this "approach offers a simple and robust sequential Monte Carlo 
alternative to the traditionally hard problem of approximating marginal predictive densities via MCMC 
output". 

We note that, since the PL sample is considered as an approximate sample from the posterior 7r(p, Ajy*) 
it is possible to evaluate the evidence using Chibl 's (1995) formula rather than the above proposal of the 
authors. The availability of an alternative estimator of the evidence allows for a differenciation between 
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Figure 1: Comparison of the posterior distributions on the ordered A^'s produced by PL (left) and MCMC 
(right) for 10^ simulated observations from a 4 component Poisson mixture and 10'' particles/MCMC 
iterations. The curves are obtained by apply the R function density() to both samples. The true values 
are indicated at the top of each graph 



4 



the evaluation of approximation [of the target posterior distribution] resulting from the particle system 
(seen through a possible bias in Chib's, 1995, version) and the evaluation of the approximation [of the 
evidence] resulting from the use of the product marginal in iLooes et al. Thus, in contrast to the 



other discussions of ours, we evaluate here the specific degeneracy of the evidence approximation due to 
using a product of approximations. 

2.2 A Monte Carlo experimentation 

In order to evaluate the per f orman ces of the PL algorithm when compared with the vanilla Gibbs sampler 



( Diebolt and Robert , 1990l . ll994h . we simulated 250 samples of size 10^ from Poisson mixtures with 4 



and 5 components and with either widely spaced or close components, A = (10, 50, 110, 150, 180, 210) and 
A = (10, 15,20,25,30,35), respectively, and with slightly decreasing weights pi. We ran a 10^ iteration 
Gibbs sampler for Figures [2HS1 performing a further 10^ iterations as a check of the stability of the 



MC MC approx imation. (For Chib's approximation to perform correctly, as noted in lBerkhof et al.l . 12003 



and Marin and Robert, 2010l it is necessary to average over all k\ permutations of the component indices 



for both the original PL sample and the MCMC sample in order to escape label switching issues.) 

The first interesting outcome of our experiment is that the PL sample does not suffer from degeneracy 
for a small enough number of observations, since the ranges of the Chib's (2005) approximations for both 
PL and MCMC samples (represented by the second and third columns in the boxplots) are then the 
same. However, as predicted by the theory (see the discussions by Chopin and Robert, and by Robert 
and Ryder) , increasing the number of observations without simultaneously and exponentialy increasing 
the number of particles necessarily leads to the degeneracy of the simulated sufficient statistic paths. In 
our experiment, this degeneracy always occurs between 5, 000 and 10, 000 observations. The phenomenon 
clearly appears on Figures [2HS] where both the range and the extremes of the evidence approximations 
significantly differ on the right hand side boxplot graph. (Again, the stability of the MCMC range was 
tested by running the Gibbs sam pler for much longer a nd observing no variation.) This divergence is 
to be contrasted with Figure 1 in ICarvalho et al. I (l2009l) which concludes to an agreement between all 



approximations to the Bayes factor. 

The second result that is relevant for our discussion is that the new approximation to the evidence 
proposed by the authors suffers from a severe bias as one proceeds through the observations. This issue is 
apparently unrelated to the degeneracy phenomenon observed above in that the discrepa ncy starts from 



the beginning, the closest approximation occuring for n = 1,000 observations. Note that ICarvalho et al 
(12009,) mention that the evidence approximation based on particle learning was less variable. While this 
feature is not visible in our experiment, it is not necessarily a positive feature in any case, as shown 
in the current experiment. (In order to provide a better rendering of the comparison between the PL 
and the MCMC algorithms, we excluded the outliers from all boxplots. We however stress that both 
PL approaches had a higher propensity to outlying behaviour.) In the strongest case of discrepancy 
between PL and MCMC found in our experiment. Figure [6] illustrates the departure between the three 
approaches from a particularly influential observation, since the graphs are compared in terms of evidence 
per observation. 

We thus conclude at t he lack of ro bustness of the new approximation of evidence suggested in both 
Carvalho et al. ( 20091) and lLopes et al. ( 2010i) (besides providing a reinforced demonstration of the overall 



difficulty with degeneracy). 



3 Repeatability of the degeneracy (lacobucci, Marin and Robert) 



Following the floor discussion at the conference, we want to point out here that the divergence between 
the evidence evaluations observed in the discussion of lacobucci et al. is not the result of an outlying 
Monte Carlo experiment but indeed a distributional property. This can be seen on Figure [7] which 
reproduces the study of lacobucci et al. (in this set of comments) on the variation of the evidence in 
the speciflc setting of mixtures of Poisson distributions. For two given datasets, we repeated 683 times 
the three evidence approximations using the method proposed in Lopes et al. (2010), and Chib's (1995) 
method applied to both the PL and MCMC samples. The divergence between the three evaluations is 
consistent across simulations, so repeating simulations does not help in exhibiting this divergence. 
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Figure 2: Evolution against the number of observa tions (n = 100, n = 1,000 and n — 10,000) of the 
evidence approximation based on a PL sample and Lopes et al. ( 2010t ) approximation, on a PL sample 
and Chib's (1995) approximation, on an MCMC sample and Chib's (1995) approximation, for a particle 
population of size 10, 000, a mixture with 4 components and scale parameters A = (10, 50, 110, 150). 




Figure 3: Same caption as Figure [2] for A = (10, 15, 20, 25). 




Figure 4: Same caption as Figure [5] for 5 components. 
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Figure 5: Same caption as Figure [3] for 5 components. 
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Figure 6: Evolution of the three approximations of the evidence per observation against the number of 
observations for a specific sample simulated from the same Poisson mixture as in Figure [21 
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same data; ^.-{10, 30,1 10,150) - no outliers 
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nobs= 1 0000 



Figure 7: Range of the evidence approximation based on a PL sample and lLopes et al. approx- 
imation, on a PL sample and Chib's (1995) approximation, on an MCMC sample and Chib's (1995) 
approximation, for a particle population of size 10, 000, a mixture with 4 components and scale param- 
eters A = (10, 50, 110, 150), and 683 replications. 



4 On degeneracy (Chopin and Robert) 



In this discussion, we consider the performance of the particle learning technique of Lopes et al. ( 2010l ) 
in a limiting case, in order to illustrate the fact that a particle system cannot but degenerate, even when 
considering sufficient statistics Zt with fixed dimensions. 



4.1 Particle system degeneracy 



When ILopcs et al ] (HoIoD state that p{Z*\y*) is not of interest as the filtered, low dimensional p(Zt\y*) 
is sufficient for inference at time t, they seem to implicitely imply that the restriction of the simulation 
focu s to a low dimensional vector is a way to avoid the degeneracy inherent to all particle filters (see, 
e.g., iDel Moral et"d1 . l2006l) . However, the degeneracy of particle filters is an unavoidable consequence 
of the explosion of the state vector and the issue does not vanish because one is only interested in 
the marginal 



p{Zt\y') = I v{Z'\v') dZ- 



Indeed, as shown by the pseudo-code rendering in iLopes et al. ( 2010f ). the way PL produces a sample 
fromp(Zt|y*) is by sequentially simulating and by extracting Zt as the final output from this sequence. 
The PL algorithm therefore relies on an approximation of p(Z*|j/*) and the fact that this approximation 
quikckly degenerates as t increases, as discussed below and in the companion discussion by Robert and 
Ryder, obviously has an impact on the approximation of p{Zt\y*). 

Inherently, particle learning (PL) is at its core an auxil iary parti c le filt er ( Pitt and Shepha"rdl . 19991 ) 
applied in settings where there exists a sufficient statistic (jParmoisl . Il935) of reduced (or, even better , 
with fixed) dimension. The simulation scheme thus relies on resampling (|Rubinl . Il988l iKitagawal 119961 ) 
for adjusting the distribution of the current particle population to the new observation yt+i- Because 
of this continual resampling, the number of different values of Zp (p > 1) contributing to the sufficient 
statistic Zt (t > p) is decreasing in t at an exponential rate for a fixed p. Therefore, unless the size of the 
particle population exponentially increases with t (see Done et al.l 12002 . and the companion discussion 
by Chopin and Schafer), the sample of Zt's will not be distributed as an iid sample from p{Zt\y*'). The 
following section very clearly makes this point through a simple if representative example. 
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Figure 8: Evolution of the particle learning sample against the target distribution in terms of the number 
T — 50, . . . , 5000 of iterations, for a particle population of fixed size IC*. 

4.2 A simple particle learning example 

Consider the ultimate case where the zt's are completely independent from the observations yt, zt ~ 
A/'(0, 1), and where the empirical average of the ZtS is the sufficient statistic. In this setting, the PL 
algorithm simplifies into the following iteration t: 

1. Resample uniformly from {Z[, . . . , Z* ) to produce (3i, • ■ • , 3^); 

2. Generate zn ^ A/'(0, 1); 

3. Update = {t^\ + zu)/{t + 1) 

The target distribution of the (sufficient) empirical average 

= (zi + . . . + Zt)/t 

is obviously the normal A^(0, 1 /i) distribution. A straightforward simulation of the above particle system 
shows how quickly the degeneracy occurs in the sample: Figures [5HS1 show a complete lack of fit to the 
target distribution as early as t = 500 simulations when using 10, 000 particles. 



4.3 Conclusion 



The paoerlLopes et al 



2004. Del Moral et al 



(120101) fails to mention the well-documented issue of particle degeneracy (|CaPDe et al 
120061 ) ■ thus giving the impression that PL escapes this problem. Our simple ex- 



ample shows that a particle system cannot be expected to withstand an indeterminate increase in the 
number of observations without imposing a corresponding exponential increase in the particle size. 
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5 On the degeneracy of sufficient statistics (Robert, Ryder and 
Chopin) 



In connection with the discussion of Chopin and Robert, we det ail in this discussion how the degeneracy 
dynamics of the particle learning technique of lLopes et al. impacts the distribution of the sufficient 



(or "essential state ve ctor" ) statistics. 

Lopes et alJ ()2010l ) focus on the distribution of a sufficient statistic, p{Zt\y^), at time t. By insisting 



both on the low dimensionality of Zt and on the sufficiency, they give the reader the impression that 
the poor approximation of the state vector resulting from the resampling propagation scheme does 
not impact p{Zt\y^), since their statement "at time T , PL provides the filtered distribution of the last 
essential state vector Zt, namely p{ZT\y'^)" (Section 1.2) does not mention any deterioration in the 
approximation — this i s how we understand filte red — provided by PL. Because particle learning is inher- 
ently a particle filter ( Pitt and Shephardl . 1999t ). this intuition is unfortunately wrong, as shown below 



in the case of an empirical average of the past auxiliary variables Zt . Contrary to the belief that "resam- 
pling (...) is fundamental in avoiding a decay" (Section 1.2), resampling necessarily leads to degeneracy 
unless the size of the particle population increases exponentially with t. 

We thus consider again the case introduced by Chopin and Robert in their discussion, when the 
auxiliary variables Zt ~ A/'(0, 1) are independent from the observations yt and where the essential state 
vector statistic is the empirical average of the zt's. In this case, the distribution of the empirical average 

= (zi + . . . + Zt)/t 

is the normal A/'(0, 1/t) distribution, but the particle population degenerates into a single path from 
the point of view of this sufficient statistic. In other words, degeneracy occurs much faster than the 
root T forgetting of the past of the particle path that is due to the averaging. In order to support this 
perspective, we provide here a derivation of the variance of the particle population after t iterations. 
Using the same notations as in Chopin and Robert, since E[.^*] = 0, var(Z*) — 1/t and Z* — 
+ we consider 



t- 1 



2 



[3r^ = 3^'noi'f]+mi' ^ 3r']E[3*-^3r']) 



n Vl 1 n- 1 
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Now let Ut — t'^E.lZfZ*] — t + n. The last Ime becomes ut = ^^-^Ut_i. Since ui = n — 1, we have 



ut + t-n _ \n-l)+t 



— {{n-lY-n' +tn'-^} 



t~l n 



t-2 



2t n* 



+ 



0„(n-i). 



In conclusion, 



1 n(n — 1) 1 xt t t 1 

var(Zj^-+ ' ^^-^ {(n-l)*-n'+tn*-i} 



nt 
1 



n(n — 1) 
t 



{(1- 1 + t/n} 



For n fixed, and t — > +oo, tvar(Zj) — ^ 1, a limit that does not depend on n, i.e. the system eventually 
degenerates to a single path. If we set n = ct, then ?iivar(Zj) C, for some C > 0. Bearing in mind that 
the actual posterior variance should be 0(i"^), this means that, to bound the relative error uniformly 
over a given time interval, i.e. for t — 1, . . . T, one must take n — 0(T). 



6 On the degeneracy of path functionals in SMC (Chopin and 
Schafer) 



Much of the confu sion around the degeneracy of particle learning and similar algorithms ( Fearnheadl . 



2002LIStorvild . 120021 ) seems related to the lack of formal results regarding the degeneracy of path functional 



in Sequential Monte Carlo. We'd like to report here some preliminary investigation on this subject. 

Consider a standard state-space model, with observed process (yt), and hidden Markov process (xt), 
and a basic particle filter, which would track the complete trajectory xi-t, i.e. which would produce, at 
each iteration t, N simulated trajectories xj""*, with some weight w["\ so as to approximate p{xi;t\yi:t)- 
It is well known that the Monte Carlo error regarding the expectation of ip{xi;t) (a) remains bounded over 
time if ip{xi;t) = Xt, (the filtering problem), and (b) blows away, at an exponential rate, if ip{xi;t) = x\ 
(the smoothing problem). Chopin (2004) formalises these two statements by studying the asymptotic 
variance that appears in the central limit theorem for the corresponding particle estimates. 

As mentioned above, and to the best of our knowledge, there is currently no formal result on the 
divergence of the asymptotic variance for test functions like Lp{x\;t) = t^^ X]i=i i-^- some symmetric 
function with respect to the complete trajectory. (The fact that this function is a sufficient statistic 
should not play any role in t his convergence study.) One difficulty is that the iterative definition of the 
asymptotic variance given by IChopin (2004) leads to cumbersome calculations. 

We managed however to compute this asymptotic variance exactly, for the Gaussian local level model: 



Xt+i I a;t iV(x4, 1), yt\xt ^ N{xt,\) 



and the functional ^p(x\;t) — t^^ X]i=i ^i- this case, the asymptotic variance diverges at rate 0(e^*' jt^). 
Exact calculations may be requested from the authors. We plan to extend these results to a slightly 
more general model, e.g. with unknown variances, and a function ip which would be a sufficient statistic 
for such parameters. We conjecture that this exponential divergence occurs for many models: basically, 
in an average like ^[x\;t) — ^~^X]*=i^ii the Monte Carlo error attached to xxjt should be 0{e'^* /t^), 
and should dominate all the other terms. This is at least what one observes in toy examples. After, 
say, 100 iterations of a particle filter, the number of distinct values within all the simulated trajectories 
(that have survived so far) for the component xi is typically very small, and the degeneracy in the xi 
dimension seems sufficient to endanger the accuracy of any estimate based on the complete trajectory 
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7 Remarks on the rejoinder (Robert) 



Lopes et al. (2010) published a rejoinder on the discussions of their paper and the following is a detailed 
examination of the arguments found in this rejoinder, which requires a preliminary reading of the above 
papers as well as our discussion. (All quotes are taken verbatim from the rejoinder.) 

"Particle learning based on the product estimate and MCMC based on Chib 's formula produce rela- 
tively similar results either for small or large samples. " 
This statement about the estimation of the marginal likelihood (or the evidence) and the example 
A that is associated with it in the rejoinder thus comes to contradict our (rather intensive) simulation 
experiment which, as reported in the discussion (Section ^ , concludes to a strong bias in evidence 
approximation induced by using particle learning, whether or not the product estimator is used. We 
observed there that there were two levels of degeneracy, one due to the product solution (errors in a 
product being more prone to go and multiply) and one due to the particle nature of the sequential 
method (which does not refresh particles from earlier periods) . Figures [5HS] are at odds with the one 
presented in the rejoinder, maybe because we consider 10, 000 observations rather than 100. (I also fail 
to understand how the "Log-predictive (TRUE)" quantity is derived.) 

"Black-box sequential importance sampling algorithms and related central limit theorems are of little 
use in practice. " 

This is a quote from the rejoinder that is rather puzzling. There is not hing wrong with the central 
limit theorem which is the basis of error assessment in Monte Carlo studies ( Robert and Casella . 20091) . 



Indeed, one major consequence of the central limit theorem is that it provides a precise scale for the 
speed of convergence of Monte Carlo estimates and thus an indicator on the number of particles needed 
for a given precision level. The authors of the rejoinder then criticise our use of "1000 particles in 
5000 dimensional problems" as we "shouldn't be surprised at all with some of our findings". This is 
not factually exact since I find no trace in the discussion of such a case: we use 10, 000 particles in 
all examples and the target is either the distribution of the 4 mixture parameters, the evidence or the 
distribution of a one-dimensional sufficient statistic. Furthermore, these values of n and N are those used 
in their example D. More importantly, nor the paper neither the rejoinder map a practical strategy on 
how to increase the computational effort along with the number of observations. 

"This argument [that the Monte Carlo variance will 'blow-up 'J is incorrect and extremely misleading. " 
This point is central both to the discussions above and to the rejoinder, as the authors maintain 
that the inevitable particle degeneracy does not impact the distribution of the sufficient statistics. The 
argument about using time averages over particle paths rather than sums appears reasonable at first. 
Actually, taking an empirical average in almost stationary situations should produce an approximately 
normal distribution. With an asymptotic variance different from (thanks to the central limit theorem) 
However, this is not the main argument used in the discussions. Degeneracy in the particle path means 
that the early terms in the average are less and less diverse in the sample average. Therefore it is not 
that surprising that the variance is decreasing down to too small a value! As shown in Figure [5] above, 
degeneracy due to resampling may induce severe biases in the distribution of empirical averages while 
giving the impression of less variability (which is a recurrent argument in the rejoinder). Furthermore, 
the fact that parameters are simulated [rather than fixed] in the particle filter means that the process 
is not geometrically ergodic, hence that Monte Carlo errors tend to accumulate along iterations, rather 
than compensate. (This is why the comparison between PL and sampling importance resa mpling is par- 



ticula rly relevant, because it does not address this accumulation.) The rejoinder also quotes lOlsson et al] 



( 2008[) for ju s tifyin g the decrease in the Monte Carlo variance. This is somehow surprising in that (a) 



lOlsson et al. (j2008[) show that there is degeneracy without a fixed-lag smoothing and (b) they require 
a geometric forgetting property on the filtering dynamics. In addition, I think that Example E used to 
illustrate the point about variance reduction is not very appropriate for this issue because the hidden 
Markov chain is a Gaussian random walk, hence cannot be stationary (a fact noted by the authors). And 
once again a decrease in the "MC error" does not mean a converging algorithm because degeneracy nat- 
urally induces empirical variance decrease. (I also fail to see why the "prior" on [xt) is improper.) The 
final (if recurrent) argument that "PL parameters do not degenerate" is somehow puzzling: by nature, 
those parameters are simulated from a distribution conditional on the sufficient parameters. So obviously 
the simulated parameters all differ. But this does not mean that they are marginally distributed from 
the right distribution. 



12 



"MCMC schemes depend upon the not so trivial task of assessing convergence. How long should the 
bum-in Go be?" 

The rejoinder concludes with recommendations that sound more Hke a drafted to-do note the authors 
forgot to remove than an accumulation of true recommendations. (The above quote rather clearly 
supports our first point in the discussion.) It seems to me that the comparison between MCMC and 
particle filters is not particularly relevant, simply because particle filters apply in [sequential] settings 
where MCMC cannot be implemented. To try to promote PL over MCM by arguing that MCMC 
produces dependent draws while having convergence troubles is not needed (besides, PL also produces 
[unconditional] dependent draws). To advance that the Monte Carlo error for PL is of orser Ct/VN is 
not relevant either because Ct is exponential in T and because MCMC also has an error in 
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